Expert Systems with Applications — Latest Matching Preprints

1

Reinforcement learning optimization of automated insulin delivery in type 1 and type 2 diabetes mellitus

Lopez Palau, N. E.; Naranjo-Meneses, P.; Szendroedi, J.; Eils, R.; Kallenberger, S. M.

2025-10-14 endocrinology 10.1101/2025.10.12.25337835 medRxiv

Top 0.1%

28.9%

Show abstract

Closed-loop insulin delivery systems have proven effective in regulating blood glucose (BG) concentration, thereby reducing the burden of self-care in the management of type 1 and type 2 diabetes mellitus. However, the prevalence of unexpected disturbances resulting from oral glucose intake represents a considerable challenge to the full automation of these systems. Here, we propose an actor-critic reinforcement learning (RL) framework implemented within environments governed by compartmental ordinary differential equation models of the glucose-insulin-glucagon-incretins dynamics. This approach was employed to optimize automated insulin delivery in virtual patients with type 1 and type 2 diabetes mellitus, under scenarios involving unforeseen BG disturbances. The resulting optimal RL policies were tested in silico on virtual patients subjected to three unannounced glucose disturbances over the course of a day. The findings demonstrated that optimal RL policies could sustain the BG a significantly higher percentage of time within the normoglycemic range and a significantly lower percentage of time below the normoglycemic range in comparison to either continuous or discrete proportional-integral-derivative control algorithms. These results set the basis for developing new approaches to optimizing automated dosing regimens for chronic disease management. Author SummaryManaging diabetes requires constant attention to glucose levels and the corresponding adjustment of insulin doses, which can be demanding for insulin-dependent patients. Semi-automated systems, frequently referred to as "artificial pancreas", aim to mitigate this burden by adjusting insulin delivery based on glucose levels obtained from a continuous glucose sensor. However, these systems underperform in scenarios involving unexpected glucose disturbances, such as those triggered by the omission of meal announcing within the system interface. In the present study, we developed a computer-based learning approach to identify deep network-based functions that determine the appropriate insulin doses to regulate glucose in real time and mitigate unannounced disturbances. This approach has been demonstrated to be effective in the management of Type 1 and Type 2 diabetes and it does not require any additional input other than the continuous glucose measurements. The findings of our study demonstrate that our functions were able to maintain glucose levels within a healthy range for a longer period of time than the standard functions, while also reducing the risk of threating low glucose levels. This research contributes to the development of fully automated insulin delivery systems that are capable of adapting to real-life situations of individuals living with diabetes.

2

Predictive Modeling for Diabetes Using GraphLIME

Costi, F.; Onchis, D.; Hogea, E.; Istin, C.

2024-03-15 endocrinology 10.1101/2024.03.14.24304281 medRxiv

Top 0.1%

23.1%

Show abstract

The purpose of this paper is to present a detailed investigation of the advantages of employing GraphLIME (Local Interpretable Model Explanations for Graph Neural Networks) for the trustworthy prediction of diabetes mellitus. Our pursuit involves identifying the strengths of GraphLIME combined with the attention-mechanism over the standard coupling of deep learning neural networks with the original LIME method. The system build this way, provided us a proficient method for extracting the most relevant features and applying the attention mechanism exclusively to those features. We have closely monitored the performance metrics of the two approaches and conducted a comparative analysis. Leveraging attention mechanisms, we have achieved an accuracy of 92.6% for the addressed problem. The models performance is meticulously demonstrated throughout the study, and the results are furthermore evaluated using the Receiver Operating Characteristic (ROC) curve. By implementing this technique on a dataset of 768 patients diagnosed with or without diabetes mellitus, we have successfully boosted the models performance by over 18%.

3

A Tabular Residual Neural Network for Diabetes Classification and Prediction

Hammond, A.; Afridi, M.; Balakrishna, K.

2025-12-29 endocrinology 10.64898/2025.12.29.25343132 medRxiv

Top 0.1%

19.3%

Show abstract

Diabetes Mellitus (DM) is a metabolic disorder characterized by hyperglycemia, with type 1 characterized as an autoimmune destruction of pancreatic beta cells and type 2 characterized by insulin resistance with progressive beta cell dysfunction. This study applied an existing binary classification algorithm (ALTARN) to accurately predict DM. ALTARN, as a tabular attention residual neural network, uses residual connection to find complex patterns present in tabular columns. We achieved an average training accuracy of 75.22%. Furthermore, a robust set of validation metrics was obtained via five-fold stratified cross-validation, yielding an average accuracy of 74.61%, an average precision of 72.36%, a mean recall of 79.69%, and a mean F1 score of 75.83%.

4

Robust Feature Selection for Cancer Microarray Data Using a Hybrid mRMR and Binary Lion Optimization Algorithm

Sahu, B.; Panigrahi, A.; Abhilash Pati, A. P.; Madhavi, B. K.; Mishra, J.; Budhathoki, R. K.; Mallik, S.

2025-10-08 public and global health 10.1101/2025.10.07.25337478 medRxiv

Top 0.1%

18.1%

Show abstract

Microarray cancer datasets are characterized by a large number of irrelevant, redundant, and noisy features, which can severely hinder the accuracy and efficiency of classification algorithms. Feature selection, as a crucial branch of feature engineering, aims to enhance classification performance by identifying and retaining only the most informative features. However, feature selection is an NP-hard problem, where conventional search strategies are often prone to premature convergence and local optima, resulting in increased computational burden. To address these challenges, global metaheuristic algorithms have been widely explored. The recently proposed Lion Optimization (LO) algorithm has shown promising results for continuous optimization problems, yet its design is not inherently suited for discrete feature selection tasks. To overcome this limitation, a binary variant of the LO algorithm, termed Binary Lion Optimization (BLO), is introduced for wrapper-based feature selection in microarray cancer data analysis. In this work, the Minimum Redundancy Maximum Relevance (mRMR) criterion is first employed as a filter method to identify an initial subset of relevant features, thereby reducing search complexity. The refined feature subset is then optimized using the BLO algorithm to achieve improved classification outcomes. The proposed mRMR-BLO framework was evaluated on several widely recognized cancer microarray datasets and benchmarked against four state-of-the-art binary optimization algorithms. Experimental results demonstrate that mRMR-BLO consistently identifies smaller yet highly discriminative feature subsets, while achieving competitive or superior prediction accuracy. These findings highlight the potential of mRMR-BLO as an effective and robust tool for high-dimensional microarray cancer classification.

5

Leveraging AutoML to provide NAFLD screening diagnosis: Proposed machine learning models

Haider Bangash, A.

2020-10-22 endocrinology 10.1101/2020.10.20.20216291 medRxiv

Top 0.1%

15.3%

Show abstract

NAFLD is reported to be the only hepatic ailment increasing in its prevalence concurrently with both; obesity & T2DM. In the wake of a massive strain on global health resources due to COVID 19 pandemic, NAFLD is bound to be neglected & shelved. Abdominal ultrasonography is done for NAFLD screening diagnosis which has a high monetary cost associated with it. We utilized MLjar, an autoML web platform, to propose machine learning models that require no coding whatsoever & take in only easy-to-measure anthropometric measures for coming up with a screening diagnosis for NAFLD with considerably high AUC. Further studies are suggested to validate the generalization of the presented models.

6

Identification of Blood Glucose Patterns through Continuous Glucose Monitoring Sensors and DecisionTrees

Lozano Serrano, F. J.; Hidalgo Perez, J. I.; Botella Serrano, M.; Contador Pachon, S.; Lanchares Davila, J.; Velasco Cabo, J. M.; Garnica Alcazar, O.

2020-09-11 endocrinology 10.1101/2020.09.09.20190736 medRxiv

Top 0.1%

14.8%

Show abstract

The demand for Continuous Glucose Monitoring systems is increasing among type 1 diabetic patients. Some companies are trying to improve the monitoring and the usability of these systems. One example is Abbott FreeStyle Libre, which provides a new concept of glucose monitoring called Flash Glucose Monitoring which is more affordable and does not need calibration. The increasing demand for these devices means an opportunity for data and computer scientists, who can contribute to the development of decision-making support systems based on the data collected from the devices. Type 1 diabetic patients that use FreeStyle Libre may enter the number of insulin and carbohydrates units that they are going to take before a meal. Using both the entered data and the blood glucose values collected by the device automatically, the application presented in this paper generates a report of the patients glucose patterns. In addition, it provides a web application that allows the user to upload the data obtained from the device and download the report on his computer or smartphone. The application uses decision trees to detect the patterns and entails a starting point in the creation of ensemble models with more predictive power, also based on decision trees. Furthermore, the methodology makes a segmentation of the data set in blocks, determined by the different meals done throughout the day, adding more information to the set of variables used to train the model. As a result, the application can discover repetitive patterns in the daily life of the patient, which can help to take early preventive measures for risk situations in a period close to the next meal.

7

Estimated Average Glucose Integration (eAGi) and A1c Prime (A1c'): A Novel Algorithm for Estimating Glycemic Control Over Time

Koltz, C. R.

2025-06-28 endocrinology 10.1101/2025.06.26.25330375 medRxiv

Top 0.1%

14.5%

Show abstract

Type 2 diabetes affects hundreds of millions worldwide and is associated with significant adverse health outcomes. Optimizing glycemic control mitigates risk over time and has been shown to improve cardiovascular outcomes, reduce microvascular injury as well as decrease overall mortality. Diabetes-specific risk is determined not only by glycemic control but duration of exposure which has been historically difficult to quantify in aggregate. The model proposed in this paper computes two long-term diabetes control scores using a novel algorithm that integrates estimated average glucose as a function of time. The calculated scores are called Estimated Average Glucose Integration (eAGi) and A1c Prime (A1c). Based on previous studies demonstrating that poor glycemic control over time is associated with worse outcomes, eAGi and A1c are anticipated to provide a useful, diabetes-specific hazard appraisal for individuals in both clinical and research settings.

8

Developing a GraphRAG-enabled local-LLM for Gestational Diabetes Mellitus.

Sharma, R.

2025-04-30 endocrinology 10.1101/2025.04.28.25326568 medRxiv

Top 0.1%

14.4%

Show abstract

This paper re-imagines a world of abundance in the treatment of chronic diseases such as Tpe 2 Diabetes. It asks: what if preventive and diagnostic remedies were widely made available across the world, informed by the latest medical research? As Proof-of-Concept of a proposed solution, the paper describes the development and validation of a local Large Language Models (local-LLMs) based on Graph-based Retrieval-Augmented Generation (GraphRAG) for managing Gestational Diabetes Mellitus (GDM). The research thus seeks new insights into optimizing GDM treatment through a knowledge graph architecture, contributing to a deeper understanding of how artificial intelligence can extend medical expertise to underserved populations globally. The study employs an agile, prototyping approach utilizing GraphRAG to enhance knowledge graphs by integrating retrieval-based and generative artificial intelligence techniques. Training data was from academic papers published between January 2000 and May 2024 using the Semantic Scholar API and analyzed by mapping complex associations within GDM management to create a comprehensive knowledge graph architecture. It is categorically stated that, since the primary research objective was to establish the feasibility of a GraphRAG local-LLM PoC, no human subjects nor actual patient datasets were used. Empirical results indicate that the GraphRAG-based Proof of Concept outperforms open-source LLMs such as ChatGPT, Claude, and BioMistral across key evaluation metrics. Specifically, GraphRAG achieves superior accuracy with BLEU scores of 0.99, Jaccard similarity of 0.98, and BERT scores of 0.98, offering significant implications for personalized medical insights that enhance diagnostic accuracy and treatment efficacy. This research offers a novel perspective on applying GraphRAG-enabled LLM technologies to GDM management, providing valuable insights that extend current understanding of AI applications in healthcare. The studys findings contribute to advancing the feasibility of GenAI for proactive GDM treatment and extending medical expertise to underserved populations globally.

9

Leptin augmented model to include the role of obesity in insulin-glucose regulatory system for T2DM subjects

ARAVINDAKSHAN, M. R.; Ghosh, D.; Mandal, C.; Sarkar, J.; Maity, S. K.; Chakrabarti, P.

2024-06-19 endocrinology 10.1101/2024.06.18.24309097 medRxiv

Top 0.1%

12.6%

Show abstract

Leptin is a fat cell-derived hormone involved in satiety and body weight regulation. It also plays a critical regulatory role in the insulin-glucose regulatory system by modulating glucose metabolism and energy homeostasis. However, existing insulin-glucose models often fail to consider the impact of body weight indicators mainly body mass index (BMI) and plasma leptin. To address this limitation, we propose augmenting the ordinary differential equations (ODE) of the Oral Minimal Model (OMM) with an additional equation, incorporating leptin as well as supplementary terms and parameters. By estimating the model parameters, the model behaviour is aligned with the observed data of glucose, insulin and leptin for individuals with type 2 diabetes mellitus (T2DM). Based on model behaviour, revised indices formulated from Oral Glucose Tolerance Test (OGTT) data by including BMI and fasting leptin values are found to have a better correlation with existing indices. Additionally, parameter sensitivity analysis is performed to investigate the influence of the model parameters on the observed variables. Validation of the augmented model with clinical data (without leptin) demonstrates a superior fit to glucose and insulin data compared to the base model. This model emphasizes the intricate associations between leptin, glucose, and insulin concentrations with a potential for developing targeted interventions and therapies for T2DM. Notably, this manuscript introduces the first ODE-based model that incorporates leptin and BMI in the insulin-glucose pathway.

10

Predicting Common Pathway Signatures Between DNA Methylation and Post Translational Modification in Type II Diabetes & Parkinson's Disease Using Heterogeneous Data Integration

Biswas, S.; Mitra, P.; Rao, K. S.

2024-09-27 endocrinology 10.1101/2024.09.26.24314438 medRxiv

Top 0.1%

12.5%

Show abstract

The complex diseases, namely, Type 2 Diabetes Mellitus (T2DM) and Parkinsons Disease (PD), are extensively studied due to their prevalence in a large population group. Between these two diseases, T2DM is denoted as the zero index disease in a patient, which may lead to PD in a more advanced clinical stage. Both of these diseases may occur due to abrupt DNA methylation of genes. Likewise, both diseases may occur in a patient due to protein misfolding. Our study proposes a novel framework for building two disease-specific heterogeneous networks by integrating different tissue-based transcriptomics, epigenetics, epistasis, and PPI-based topological information. We predict the missing links between the DNA methylation and Post-Translational Modification (PTMs) associated with protein aggregation. Next, we have predicted the common signature of the prevalence of linked patterns in both diseases, further validated by relevant biological evidence.

11

Enhanced Diabetes Prediction Using Novel Additive-Multiplicative Neural Networks: A Comprehensive Machine Learning Analysis of the PIMA Indians Dataset

Demirel, S.; Aytekin, K.; agraz, m.

2025-09-22 endocrinology 10.1101/2025.09.20.25336250 medRxiv

Top 0.1%

12.1%

Show abstract

BackgroundEarly diabetes detection remains challenging, requiring robust machine learning approaches that balance accuracy with clinical interpretability for effective diagnostic support. MethodsWe are proposing a novel Additive and Multiplicative Neurons Network (AMNN) that combines both additive and multiplicative computational pathways to capture complex nonlinear relationships in diabetes prediction. Using the PIMA Indians Diabetes dataset (n=768), we compared AMNN against nine established algorithms including XGBoost, KAN, and traditional neural networks. Data preprocessing included SMOTE oversampling for class imbalance, and model interpretability was enhanced through SHAP and LIME explainable AI techniques. ResultsThe AMNN model outperformed all baseline approaches, achieving 75.76% accuracy, a 76.18% F1-score, and an AUC-ROC of 0.8206. Across both traditional feature selection techniques and explainable AI analyses, glucose levels, BMI, age, and pregnancy count consistently emerged as the most influential predictors. ConclusionsThe AMNN framework demonstrates strong potential for diabetes prediction by balancing accuracy with clinical interpretability. The key predictors it highlights align closely with established medical knowledge, reinforcing confidence in its outputs and suitability for use in clinical decision-making workflows. This hybrid neural network approach represents a promising step toward transparent, AI-assisted diagnostic tools that can support healthcare professionals in practice.

12

Individualized Therapy Optimization for Type 2 Diabetes

Lozhkina, A.; Piazza, C. D.; Gabr, Z.; Rupp, M.; Herzig, D.; Bally, L.; Jaun, A.

2025-10-15 endocrinology 10.1101/2025.10.13.25337959 medRxiv

Top 0.1%

10.7%

Show abstract

Background and AimsType 2 diabetes is a wide-spread chronic condition in which blood glucose and body weight management constitute essential therapeutic targets. Emerging technologies have the potential to aid complex therapeutic pharmacotherapy choices that are optimally tailored to individual needs. Here we propose an artificial intelligence combining guidelines with clinical features and continuous glucose monitoring (CGM) to optimize therapeutic decision-making. MethodsTherapeutic guidelines are first encoded using a rule-based model and trained on a neural network. Relying on real world evidence outcomes of a specialist outpatient clinic, transfer learning is used to optimize for glucose-lowering therapies that led to successful treatment outcomes defined as an absolute 0.3% reduction in glycated (HbA1c) over 6.5% without increasing body weight for a BMI over 28. Recommendations that deviate from guidelines are described with Shapley values and tested in digital twins for statistical significance. Four CGM-derived glucose-insulin response dynamic factors serve as additional biomarkers. ResultsDual glycemic & weight targets were achieved in actual clinical practice in 51% cases, increasing to 54% when clinical guidelines were followed. Selecting outcomes in the test set that follow individualized recommendations, this increases further to 56% when using only phenotypic markers and to 64% when adding CGM-derived dynamics factors. ConclusionsTested on the limited patient number available, our findings show that AI can outperform guidelines in complex type 2 diabetes cases by integrating multiple data sources, drawing on experiential clinical insights, and selecting treatments most effective for each patients glucose and weight control. One-linerA neural network is first trained on guidelines and subsequently on real-world evidence outcomes, performing dual glycaemic/weight optimization to improve the management of type 2 diabetes with/out gluco-dynamic parameters extracted from Continuous Glucose Monitoring.

13

Tracking and Classifying Global COVID-19 Cases by using 1D Deep Convolution Neural Network

Amo-Boateng, M.

2020-06-12 public and global health 10.1101/2020.06.09.20126565 medRxiv

Top 0.1%

10.4%

Show abstract

The novel coronavirus disease (COVID-19) and pandemic has taken the world by surprise and simultaneously challenged the health infrastructure of every country. Governments have resorted to draconian measures to contain the spread of the disease despite its devastating effect on their economies and education. Tracking the novel coronavirus 2019 disease remains vital as it influences the executive decisions needed to tighten or ease restrictions meant to curb the pandemic. One-Dimensional (1D) Convolution Neural Networks (CNN) have been used classify and predict several time-series and sequence data. Here 1D-CNN is applied to the time-series data of confirmed COVID-19 cases for all reporting countries and territories. The model performance was 90.5% accurate. The model was used to develop an automated AI tracker web app (AI Country Monitor) and is hosted on https://aicountrymonitor.org. This article also presents a novel concept of pandemic response curves based on cumulative confirmed cases that can be use to classify the stage of a country or reporting territory. It is our firm believe that this Artificial Intelligence COVID-19 tracker can be extended to other domains such as the monitoring/tracking of Sustainable Development Goals (SDGs) in addition to monitoring and tracking pandemics.

14

FCFNets: A Factual and Counterfactual Learning Framework for Enhanced Hepatic Fibrosis Prediction in Young Adults with T2D

Yang, Q.; Sharma, A.; Calin, D.; de Crecy, C.; Inampudi, R.; Yin, R.

2025-03-12 endocrinology 10.1101/2025.03.07.25323577 medRxiv

Top 0.1%

10.0%

Show abstract

Hepatic fibrosis poses a significant health risk for young adults with type 2 diabetes (T2D). We propose FCFNets, a novel factual and counterfactual learning framework to predict hepatic fibrosis in young adults with T2D that can address class imbalance issue and increase interpretability leveraging electronic health records (EHRs). We designed a hybrid UNDO oversampling strategy, combining random and dissimilar oversampling that improves dataset diversity and model robustness. FCFNets also integrates SHAP-based global and instance-level explanations, alongside feature interaction analysis, providing insights into critical risk factors associated with hepatic fibrosis. The results show our proposed model outperforms various baseline methods with high sensitivity (0.846) and accuracy (0.768), while delivering counterfactual explanations. Hyperparameter tuning and dropout analysis further refine the model, ensuring optimal performance. This study demonstrates FCFNetss potential for early detection and personalized management of hepatic fibrosis, paving the way for interpretable AI applications in precision medicine.

15

Daily Rhythms in Blood Glucose: Time-of-Day Forecasts in Type 2 Diabetes

Pedersen, N. P. B.; Bugaksji, T. K.; Vera-Valdes, J. E.; Casper, S. H.; Jensen, M. H.; Vestergaard, P.; Kronborg, T.

2025-06-04 endocrinology 10.1101/2025.06.02.25328803 medRxiv

Top 0.1%

10.0%

Show abstract

BackgroundAccurate and interpretable forecasting of blood glucose levels is critical for effective manage- ment of Type 2 diabetes. While complex machine learning models offer high predictive accuracy, their opacity often limits clinical applicability. This study investigates the perfor- mance of a simple, interpretable reference model: the time-of-day mean forecast. MethodThe proposed approach divides each 24-hour period into discrete time sequences and, for each sequence, computes the mean glucose value across previous days. This methodology captures intra-day regularities in glucose dynamics and implicitly accounts for circadian influences, such as variations in insulin sensitivity and hepatic glucose production. ResultsThe model reflects intra-day glucose patterns and identifies clinically relevant periods of elevated variability, such as the postprandial and nocturnal windows. Forecasting performance improves with increased temporal granularity: in 91.84% of the individuals, at least one finer bin size outperformed the naive baseline. Where, 51% achieved optimal performance using the highest resolution with a 5-minute bin size. Compared to the naive approach, the 5-minute bin size reduced mean squared error by an average of 12.2%. ConclusionsWe have justified the time-of-day approach using a simple mean forecast model, showing that aligning prediction windows with time-of-day patterns enhances forecast accuracy. Building on this foundation, the time-of-day mean forecast serves as a practical benchmark. Future work should explore more complex models that incorporate individual covariates and dynamic temporal dependencies, while maintaining interpretability using the described temporal structure.

16

SoK: Intelligent Detection for Polycystic Ovary Syndrome(PCOS)

Li, M.; He, Z.; shi, l.; Lin, M.; Li, M.; Cheng, Y.; xue, l.; Liu, H.; Nie, L.

2024-12-28 endocrinology 10.1101/2024.12.25.24319623 medRxiv

Top 0.1%

8.4%

Show abstract

O_FIG O_LINKSMALLFIG WIDTH=146 HEIGHT=200 SRC="FIGDIR/small/24319623v1_ufig1.gif" ALT="Figure 1"> View larger version (44K): org.highwire.dtl.DTLVardef@805345org.highwire.dtl.DTLVardef@db004dorg.highwire.dtl.DTLVardef@1f0cbfaorg.highwire.dtl.DTLVardef@1dfb41d_HPS_FORMAT_FIGEXP M_FIG O_FLOATNOGraphical AbstractC_FLOATNO C_FIG HighlightsO_LIConducted a systematic review of the existing literature, focusing on Polycystic Ovary Syndrome intelligent detection, and constructed the comprehensive taxonomy for PCOS detection features to date, providing a standardized reference for future research. C_LIO_LISystematically evaluated the capabilities and limitations of current intelligent PCOS detection tools, offering valuable guidance for the development of more efficient and accurate tools. C_LIO_LIThoroughly analyzed the current status of 12 publicly available datasets used for PCOS detection, providing clear directions for future dataset development in this field. C_LIO_LIMade the analysis results publicly available, providing data resources and references for researchers, with the aim of advancing the field of intelligent PCOS detection. C_LI Recent research in the field of Polycystic Ovary Syndrome (PCOS) detection has increasingly utilized intelligent algorithms for automated diagnosis. These intelligent PCOS detection methods can assist doctors in diagnosing patients earlier and more efficiently, thereby improving the accuracy of diagnosis. However, there are notable barriers in the field of intelligent PCOS detection, including the lack of a standardized taxonomy for features, inadequate research on the current status of available datasets, and insufficient understanding of the capabilities of existing intelligent detection tools. To overcome these barriers, we propose for the first time an analytical framework for the current status of PCOS diagnostic research and construct a comprehensive taxonomy of detection features, encompassing 110 features across eight categories. This taxonomy has been recognized by industry experts. Based on this taxonomy, we analyze the capabilities of current intelligent detection tools and assess the status of available datasets. The results indicate that 12 publicly available datasets, the overall coverage rate is only 52% compared to the known 110 features, with a lack of multimodal datasets, outdated updates and unclear license information. These issues directly impact the detection capabilities of the tools. Furthermore, among the 45 detection tools require substantial computational resources, lack multimodal data processing capabilities, and have not undergone clinical validation. Based on these findings, we highlight future challenges in this domain. This study provides critical insights and directions for PCOS intelligent detection field.

17

Transfer learning applied to the forecast of mosquito-borne diseases

Coelho, F. C.; de Holanda, N. L.; Coimbra, B. M.

2020-02-04 public and global health 10.1101/2020.02.03.20020164 medRxiv

Top 0.1%

7.5%

Show abstract

Here we apply the concept of transfer learning to time series forecasting models for mosquito-borne diseases. Transfer learning, in this application, allows us to use knowledge obtained from modeling one disease to predict an emerging one for which extensive data is still not available. Here we discuss the performances of two families of models for predicting Chikungunya and Zika using models trained with dengue time series, in two Brazilian cities: Rio de Janeiro and Fortaleza.

18

Deep Learning for Screening COVID-19 using Chest X-Ray Images

Basu, S.; Mitra, S.; Saha, N.

2020-05-08 radiology and imaging 10.1101/2020.05.04.20090423 medRxiv

Top 0.1%

7.1%

Show abstract

AO_SCPLOWBSTRACTC_SCPLOWWith the ever increasing demand for screening millions of prospective "novel coronavirus" or COVID-19 cases, and due to the emergence of high false negatives in the commonly used PCR tests, the necessity for probing an alternative simple screening mechanism of COVID-19 using radiological images (like chest X-Rays) assumes importance. In this scenario, machine learning (ML) and deep learning (DL) offer fast, automated, effective strategies to detect abnormalities and extract key features of the altered lung parenchyma, which may be related to specific signatures of the COVID-19 virus. However, the available COVID-19 datasets are inadequate to train deep neural networks. Therefore, we propose a new concept called domain extension transfer learning (DETL). We employ DETL, with pre-trained deep convolutional neural network, on a related large chest X-Ray dataset that is tuned for classifying between four classes viz. normal, other_disease, pneumonia and Covid -- 19. A 5-fold cross validation is performed to estimate the feasibility of using chest X-Rays to diagnose COVID-19. The initial results show promise, with the possibility of replication on bigger and more diverse data sets. The overall accuracy was measured as 95.3% {+/-} 0.02. In order to get an idea about the COVID-19 detection transparency, we employed the concept of Gradient Class Activation Map (Grad-CAM) for detecting the regions where the model paid more attention during the classification. This was found to strongly correlate with clinical findings, as validated by experts.

19

In silico Antibody-Peptide Epitope prediction for Personalized cancer therapy

Jacobs, I.; Lim, C. M.; Emmanoil, M.; Malik, N.

2023-01-23 cancer biology 10.1101/2023.01.23.525181 medRxiv

Top 0.1%

7.1%

Show abstract

The human leukocyte antigen (HLA) system is a complex of genes on chromosome 6 in humans that encodes cell-surface proteins responsible for regulating the immune system. Viral peptides presented to cancer cell surfaces by the HLA trigger the immune system to kill the cells, creating Antibody-peptide epitopes (APE). This study proposes an in-silico approach to identify patient-specific APEs by applying complex networks diagnostics on a novel multiplex data structure as input for a deep learning model. The proposed analytical model identifies patient and tumor-specific APEs with as few as 20 labeled data points. Additionally, the proposed data structure employs complex network theory and other statistical approaches that can better explain and reduce the black box effect of deep learning. The proposed approach achieves an F1-score of 80% and 93% on patients one and two respectively and above 90% on tumor-specific tasks. Additionally, it minimizes the required training time and the number of parameters.

20

Deep learning generates custom-made logistic regression models for explaining how breast cancer subtypes are classified

Shibahara, T.; Wada, C.; Yamashita, Y.; Fujita, K.; Sato, M.; Okamoto, A.; Ono, Y.

2021-05-11 cancer biology 10.1101/2021.05.10.443518 medRxiv

Top 0.1%

6.8%

Show abstract

Differentiating the intrinsic subtypes of breast cancer is crucial for deciding the best treatment strategy. Deep learning can predict the subtypes from genetic information more accurately than conventional statistical methods, but to date, deep learning has not been directly utilized to examine which genes are associated with which subtypes. To clarify the mechanisms embedded in the intrinsic subtypes, we developed an explainable deep learning model called a point-wise linear (PWL) model that generates a custom-made logistic regression for each patient. Logistic regression, which is familiar to both physicians and medical informatics researchers, allows us to analyze the importance of the feature variables, and the PWL model harnesses these practical abilities of logistic regression. In this study, we show that analyzing breast cancer subtypes is clinically beneficial for patients and one of the best ways to validate the capability of the PWL model. First, we trained the PWL model with RNA-seq data to predict PAM50 intrinsic subtypes and applied it to the 41/50 genes of PAM50 through the subtype prediction task. Second, we developed a deep enrichment analysis method to reveal the relationships between the PAM50 subtypes and the copy numbers of breast cancer. Our findings showed that the PWL model utilized genes relevant to the cell cycle-related pathways. These preliminary successes in breast cancer subtype analysis demonstrate the potential of our analysis strategy to clarify the mechanisms underlying breast cancer and improve overall clinical outcomes.